Handwritten Kazakh and Russian (HKR) database for text recognition

Abstract

In this paper, we present a new Russian and Kazakh database (with about 95% Russian and 5% Kazakh words/sentences, respectively) for offline handwriting recognition. A few pre-processing and segmentation procedures have been developed together with the database. The database is written in Cyrillic, and the two languages share the same 33 characters. Besides these characters, the Kazakh alphabet also contains 9 additional specific characters. The dataset is a collection of forms. The sources of all the forms in the dataset were generated with LaTeX and were subsequently filled out by persons in their own handwriting. The database consists of more than 1400 filled forms. There are approximately 63000 sentences and more than 715699 symbols, produced by approximately 200 different writers. It can serve researchers in the field of handwriting recognition who use deep learning and machine learning.
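The alphabet described in the abstract (33 shared Cyrillic characters plus 9 Kazakh-specific ones) can be sketched as a character set for a recognizer. The snippet below is only an illustration under stated assumptions: it covers lowercase letters only, the names `RUSSIAN_LOWER`, `KAZAKH_EXTRA_LOWER`, `build_charset` and `char_to_index` are hypothetical, and it does not reproduce the label set actually used by the authors, which is not given here.

```python
# Illustrative sketch only: lowercase letters, not the authors' label set.

# Russian Cyrillic: the 32 letters U+0430..U+044F (а..я) plus ё (U+0451),
# which sits outside that contiguous range.
RUSSIAN_LOWER = [chr(c) for c in range(0x0430, 0x044F + 1)]
# Put ё in its alphabetical position, right after е (U+0435).
RUSSIAN_LOWER.insert(RUSSIAN_LOWER.index("\u0435") + 1, "\u0451")

# The 9 letters that the Kazakh Cyrillic alphabet adds: ә ғ қ ң ө ұ ү һ і
KAZAKH_EXTRA_LOWER = list(
    "\u04d9\u0493\u049b\u04a3\u04e9\u04b1\u04af\u04bb\u0456"
)


def build_charset():
    """Return the combined lowercase alphabet (shared + Kazakh-specific)."""
    return RUSSIAN_LOWER + KAZAKH_EXTRA_LOWER


def char_to_index(charset):
    """Map each character to an integer class index."""
    return {ch: i for i, ch in enumerate(charset)}


charset = build_charset()
index = char_to_index(charset)
print(len(RUSSIAN_LOWER), len(KAZAKH_EXTRA_LOWER), len(charset))  # 33 9 42
```

A real label set for this kind of data would presumably also need uppercase letters, digits, punctuation and a space; those are left out to keep the 33 + 9 structure visible.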

Similar resources

Sentence Boundary Detection for Handwritten Text Recognition

In the larger context of handwritten text recognition systems many natural language processing techniques can potentially be applied to the output of such systems. However, these techniques often assume that the input is segmented into meaningful units, such as sentences. This paper investigates the use of hidden-event language models and a maximum entropy based method for sentence boundary det...

Active Learning for Historic Handwritten Text Recognition

This thesis examines the use of active learning for the task of handwritten text recognition in historical documents. Active learning is a machine learning paradigm which enables the learner to select the data that is being trained on. In domains where procuring annotated data is expensive but there are large amounts of unlabelled data available, active learning can lead to better models with t...

Self-training for Handwritten Text Line Recognition

Off-line handwriting recognition deals with the task of automatically recognizing handwritten text from images, for example from scanned sheets of paper. Due to the tremendous variations of writing styles encountered between different individuals, this is a very challenging task. Traditionally, a recognition system is trained by using a large corpus of handwritten text that has to be transcribe...

Handwritten Text Recognition for Ancient Documents

Huge amounts of legacy documents are being published by on-line digital libraries world wide. However, for these raw digital images to be really useful, they need to be transcribed into a textual electronic format that would allow unrestricted indexing, browsing and querying. In some cases, adequate transcriptions of the handwritten text images are already available. In this work three systems ...

Handwritten Text Recognition for Historical Documents

The amount of digitized legacy documents has been rising dramatically over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...

Journal

Journal title: Multimedia Tools and Applications

Year: 2021

ISSN: 1380-7501, 1573-7721

DOI: https://doi.org/10.1007/s11042-021-11399-6